What is claimed is: 




1. A method of indexing a database of documents, comprising:
providing a vocabulary of n terms; 

indexing the database in the form of a non-negative n x m index matrix V,
wherein:
m is equal to the number of documents in the database;
n is equal to the number of terms used to represent the database; and
the value of each element v_ij of index matrix V is a function of the number of occurrences
of the i-th vocabulary term in the j-th document;

factoring out non-negative matrix factors T and D such that
V ≈ TD; and

wherein T is an n x r term matrix, D is an r x m document matrix, and r < nm/(n+m).
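As an illustration of the indexing step of claim 1, the following minimal sketch builds a count-based index matrix V and finds the largest rank r that satisfies r < nm/(n+m). It assumes whitespace tokenization and raw occurrence counts as the "function of the number of occurrences"; the helper names `build_index_matrix` and `max_compressing_rank` and the sample data are hypothetical and do not come from the claims.

```python
import numpy as np

def build_index_matrix(docs, vocabulary):
    """Return the n x m count matrix V (rows: terms, columns: documents)."""
    term_row = {term: i for i, term in enumerate(vocabulary)}
    V = np.zeros((len(vocabulary), len(docs)))
    for j, doc in enumerate(docs):
        for token in doc.lower().split():
            i = term_row.get(token)
            if i is not None:  # tokens outside the vocabulary are ignored
                V[i, j] += 1.0
    return V

def max_compressing_rank(n, m):
    """Largest integer r with r < nm/(n+m), i.e. r*(n+m) < n*m."""
    return (n * m - 1) // (n + m)

# Hypothetical toy data: n = 4 vocabulary terms, m = 3 documents.
vocabulary = ["matrix", "index", "query", "term"]
docs = ["matrix index matrix", "query term", "index term query query"]

V = build_index_matrix(docs, vocabulary)
n, m = V.shape
r_max = max_compressing_rank(n, m)
```

The bound r < nm/(n+m) is equivalent to r(n+m) < nm, so the factors T and D together hold fewer entries than V itself.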


2. The method of claim 1 further comprising deleting said index matrix V.

3. The method of claim 2 further comprising deleting said term matrix T.


4. The method of claim 1 wherein r is at least one order of magnitude
smaller than n.

5. The method of claim 1 wherein r is from two to three orders of magnitude
smaller than n.

6. The method of claim 1 wherein entries of said document matrix D falling
below a predetermined threshold value t are set to zero.
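The thresholding of claim 6 can be sketched as follows. This is only an illustration: the function name `sparsify`, the sample matrix, and the threshold 0.05 are hypothetical values chosen for the example.

```python
import numpy as np

def sparsify(D, t):
    """Return a copy of D with every entry below the threshold t set to zero."""
    D_sparse = D.copy()
    D_sparse[D_sparse < t] = 0.0
    return D_sparse

# Hypothetical 2 x 3 document matrix (r = 2 factors, m = 3 documents).
D = np.array([[0.90, 0.02, 0.40],
              [0.01, 0.75, 0.03]])

D_sparse = sparsify(D, t=0.05)
kept = np.count_nonzero(D_sparse)  # number of entries that survive
```

Zeroing small entries lets D be stored in a sparse format, at the cost of a coarser approximation of V.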

7. The method of claim 2 wherein r is at least one order of magnitude 
smaller than n. 

8. The method of claim 2 wherein r is from two to three orders of magnitude
smaller than n. 

9. The method of claim 2 wherein entries of said document matrix D falling
below a predetermined threshold value t are set to zero.

10. The method of claim 3 wherein r is at least one order of magnitude
smaller than n. 

11. The method of claim 3 wherein r is from two to three orders of magnitude
smaller than n. 


12. The method of claim 3 wherein entries of said document matrix D falling
below a predetermined threshold value t are set to zero.

13. The method of claim 1 wherein said factoring out of non-negative matrix
factors T and D further comprises:

selecting a cost function and associated update rules from the group: 

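cost function $F = \sum_{i=1}^{n} \sum_{j=1}^{m} \left[ V_{ij} \log (TD)_{ij} - (TD)_{ij} \right]$ associated with

update rules $T_{ik} \leftarrow T_{ik} \sum_{j} \frac{V_{ij}}{(TD)_{ij}} D_{kj}$, $T_{ik} \leftarrow \frac{T_{ik}}{\sum_{l} T_{lk}}$, and $D_{kj} \leftarrow D_{kj} \sum_{i} T_{ik} \frac{V_{ij}}{(TD)_{ij}}$,

cost function $D(V \| TD) = \sum_{i=1}^{n} \sum_{j=1}^{m} \left[ V_{ij} \log \frac{V_{ij}}{(TD)_{ij}} - V_{ij} + (TD)_{ij} \right]$ associated with

update rules $D_{kj} \leftarrow D_{kj} \frac{\sum_{i} T_{ik} V_{ij} / (TD)_{ij}}{\sum_{i} T_{ik}}$ and $T_{ik} \leftarrow T_{ik} \frac{\sum_{j} D_{kj} V_{ij} / (TD)_{ij}}{\sum_{j} D_{kj}}$, and

cost function $\| V - TD \|^{2} = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( V_{ij} - (TD)_{ij} \right)^{2}$ associated with update

rules $D_{kj} \leftarrow D_{kj} \frac{(T^{\top} V)_{kj}}{(T^{\top} T D)_{kj}}$ and $T_{ik} \leftarrow T_{ik} \frac{(V D^{\top})_{ik}}{(T D D^{\top})_{ik}}$; and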

iteratively calculating said update rules so as to converge said cost
function toward a limit until the distance between V and TD is reduced to or beyond a
desired value.
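The iterative factoring of claim 13 can be sketched for one of the listed options, the squared Euclidean distance between V and TD, using the standard multiplicative update rules for non-negative matrix factorization. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `nmf_frobenius`, the random initialization, the iteration cap, the tolerance, and the small constant `eps` guarding against division by zero are all choices made for this example.

```python
import numpy as np

def nmf_frobenius(V, r, n_iter=200, tol=1e-4, eps=1e-9, seed=0):
    """Factor non-negative V (n x m) into T (n x r) and D (r x m), V ~= T @ D."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Assumed initialization: strictly positive random factors.
    T = rng.random((n, r)) + eps
    D = rng.random((r, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        D *= (T.T @ V) / (T.T @ T @ D + eps)
        T *= (V @ D.T) / (T @ D @ D.T + eps)
        # Stop once the squared distance reaches the desired value.
        if np.linalg.norm(V - T @ D) ** 2 <= tol:
            break
    return T, D

# Hypothetical test matrix: non-negative, exactly rank 2, with n = 6 and m = 5,
# so r = 2 satisfies r < nm/(n+m) = 30/11.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 5))

T, D = nmf_frobenius(V, r=2)
error = np.linalg.norm(V - T @ D) ** 2  # squared distance between V and TD
```

Because each update multiplies by a ratio of non-negative quantities, T and D stay non-negative without any explicit projection step.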


14. A program storage device readable by machine, tangibly embodying a
program of instructions executable by the machine to perform method steps for indexing 
a database of documents, said method steps comprising: 
providing a vocabulary of n terms; 
indexing the database in the form of a non-negative n x m index matrix V,
wherein:
m is equal to the number of documents in the database;
n is equal to the number of terms used to represent the database; and
the value of each element v_ij of index matrix V is a function of the number
of occurrences of the i-th vocabulary term in the j-th document;

factoring out non-negative matrix factors T and D such that
V ≈ TD; and

wherein T is an n x r term matrix, D is an r x m document matrix, and r < nm/(n+m).

15. A database index, comprising:

an r x m document matrix D, such that

V ≈ TD

wherein T is an n x r term matrix;

V is a non-negative n x m index matrix, wherein each of its m columns
represents a j-th document having n entries containing the value of a function of the
number of occurrences of an i-th term appearing in said j-th document; and

wherein T and D are non-negative matrix factors of V and r < nm/(n+m);
and

wherein each of the m columns of said document matrix D corresponds to
said j-th document.

16. A method of information retrieval, comprising:

providing a query comprising a plurality of search terms; 
providing a vocabulary of n terms; 

performing a first pass retrieval through a first database representation and 
scoring m retrieved documents according to relevance to said query; 

executing a second pass retrieval through a second database representation
and scoring documents retrieved from said first pass retrieval so as to generate a final 
relevancy score for each document; and 

wherein said second database representation comprises an r x m document
matrix D, such that

V ≈ TD

wherein T is an n x r term matrix;
V is a non-negative n x m index matrix, wherein each of its m columns
represents a j-th document having n entries containing the value of a function of the
number of occurrences of an i-th term of said vocabulary appearing in said j-th document; and
wherein T and D are non-negative matrix factors of V and r < nm/(n+m);

and

wherein each of the m columns of said document matrix D corresponds to
said j-th document.


17. The method of claim 16 wherein said final relevancy score for any j-th
document is a function of said j-th document's corresponding entry in said document
matrix D and the corresponding entries in said document matrix D of the T top-scoring
documents from said first pass retrieval.

18. The method of claim 17 wherein said relevancy score function for said j-th
document is proportional to a sum of cosine distances between said j-th document's
corresponding entry in said document matrix D and each of said corresponding entries in
said document matrix D of the T top-scoring documents from said first pass retrieval.
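The second-pass scoring of claims 17 and 18 can be sketched as below. Several assumptions are made for illustration only: a document's "entry" in D is taken to be its column, "cosine distance" is computed as cosine similarity (so larger means closer), the constant of proportionality is 1, and the first-pass scores are invented. The names `second_pass_scores`, `candidates`, and `top_ids` are hypothetical.

```python
import numpy as np

def second_pass_scores(D, candidates, top_ids, eps=1e-12):
    """Score each candidate column of D by its summed cosine with the top columns."""
    norms = np.linalg.norm(D, axis=0) + eps          # one norm per document column
    unit = D / norms                                 # columns scaled to unit length
    sims = unit[:, candidates].T @ unit[:, top_ids]  # shape: (candidates, top docs)
    return sims.sum(axis=1)

# Hypothetical 2 x 4 document matrix: documents 0 and 1 load on the first factor,
# documents 2 and 3 on the second.
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])

# Hypothetical first-pass relevance scores, keyed by document index.
first_pass = {0: 3.0, 1: 2.5, 2: 0.4, 3: 0.3}
candidates = list(first_pass)
top_ids = sorted(first_pass, key=first_pass.get, reverse=True)[:2]

final = second_pass_scores(D, candidates, top_ids)
ranking = [candidates[k] for k in np.argsort(-final)]
```

In this toy data, document 3 ends up ahead of document 2 after the second pass even though its first-pass score was lower, because its column in D lies closer to the columns of the top-scoring documents.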

19. The method of claim 16 wherein r is at least one order of magnitude
smaller than n.

20. The method of claim 16 wherein r is from two to three orders of 
magnitude smaller than n.

21. The method of claim 16 wherein entries of said document matrix D falling 
below a predetermined threshold value t are set to zero.

22. A program storage device readable by machine, tangibly embodying a 
program of instructions executable by the machine to perform method steps for 
information retrieval, said method steps comprising:

providing a query comprising a plurality of search terms;
providing a vocabulary of n terms;

performing a first pass retrieval through a first database representation and
scoring m retrieved documents according to relevance to said query;

executing a second pass retrieval through a second database representation
and scoring documents retrieved from said first pass retrieval so as to generate a final
relevancy score for each document; and

wherein said second database representation comprises an r x m document
matrix D, such that

V ≈ TD

wherein T is an n x r term matrix;

V is a non-negative n x m index matrix, wherein each of its m columns
represents a j-th document having n entries containing the value of a function of the
number of occurrences of an i-th term of said vocabulary appearing in said j-th document; and

wherein T and D are non-negative matrix factors of V and r < nm/(n+m);
and

wherein each of the m columns of said document matrix D corresponds to
said j-th document.